Realistic Synthetic Data for Testing Association Rule Mining Algorithms for Market Basket Databases
Abstract
Association rule mining (ARM) is an important subtask in Knowledge Discovery in Databases. Existing ARM algorithms have largely been tested using artificial data generated by the QUEST program developed by Agrawal et al. [2]. Concerns have been raised before [7, 25] about the significance of such sample data. We provide the first theoretical investigation of the statistical properties of the databases generated by the QUEST program. Motivated by the claim (supported by empirical evidence) that item occurrences in real-life market basket databases follow a rather different pattern, we then propose an alternative model for generating artificial data. We claim that this model is simpler than QUEST and generates structures that are closer to real-life market basket data.
Similar resources
On the Optimality of Association-rule Mining Algorithms
Since its introduction close to a decade ago, the problem of efficient mining of association rules on market-basket data has attracted tremendous attention. Numerous algorithms have been proposed, each one in turn claiming to outperform its predecessors on a representative set of databases. In this paper, we first focus our attention on the question of how much space remains for performance imp...
RDB-MINER: A SQL-Based Algorithm for Mining True Relational Databases
Traditionally, research in the area of frequent itemset mining has focused on mining market basket data. Several algorithms and techniques have been introduced in the literature for mining data represented in basket data format. The primary objective of these algorithms has been to improve the performance of the mining process. Unlike basket data representation, no algorithms exist for mining f...
On-Line Analytical Mining of Association Rules
With the wide application of computers and automated data collection tools, massive amounts of data have been continuously collected and stored in databases, which creates an imminent need and great opportunities for mining interesting knowledge from data. Association rule mining is a data mining technique that discovers strong association or correlation relationships among data. The d...
A Pragmatic Approach on Association Rule Mining and its Effective Utilization in Large Databases
This paper deals with the effective utilization of association rule mining algorithms in large databases, especially those used by business organizations, where the number of transactions and items plays a crucial role in decision making. Frequent item-set generation and the creation of strong association rules from the frequent item-set patterns are the two basic steps in association rule mining. We...
Numeric Multi-Objective Rule Mining Using Simulated Annealing Algorithm
Abstract as a single objective one. Measures like support, confidence and other interestingness criteria that are used for evaluating a rule can be thought of as different objectives of the association rule mining problem. Support count is the number of records that satisfy all the conditions in the rule. This objective represents the accuracy of the rules extracted from the da...